-
Notifications
You must be signed in to change notification settings - Fork 31.1k
🚨 [v5] Toggle the serialization format in processors #41474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚨 [v5] Toggle the serialization format in processors #41474
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Cyrilvallez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Let's remove the option to save in legacy format though, it does not really make sense to have it anymore as we want to save with better format starting with v5!
|
Done, completely removed all legacy options for saving and in tests. CI is failing for unrelated reasons |
Cyrilvallez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks a lot for removing all the legacy! Very happy to start saving with this new format!
| output_chat_template_file_legacy = os.path.join( | ||
| save_directory, LEGACY_PROCESSOR_CHAT_TEMPLATE_FILE | ||
| ) # Legacy filename | ||
| output_chat_template_file_legacy = os.path.join(save_directory, LEGACY_PROCESSOR_CHAT_TEMPLATE_FILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this one as well? I see we already save the jinja one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for @Rocketknight1 to decide if we can delete that totally. I think we're converging to one format for templates, though not sure when the legacy can be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think legacy saving can be dropped in v5 but legacy loading needs to stay, because a lot of checkpoints are still in the old format
|
[For maintainers] Suggested jobs to run (before merge) run-slow: align, auto, chinese_clip, clipseg, flava, granite_speech, layoutlmv2, layoutlmv3, markuplm, mgp_str, oneformer, owlvit, speech_to_text, speecht5, vision_text_dual_encoder, wav2vec2 |
* toggle the serialization * prob this fixes it * fix tests * typo * delete legacy save entirely * remove extra nesting in if * revert test and serialzie a public attr instead of private
What does this PR do?
As per title, in v5 we will start saving in the new nested format. The loading is still supported though for old-format configs
Also, we don't need an
optional_attributesfield that does not do what it is called. Since it is onlychat template, let's just manually pop and set it. For audio tokenizers, it'd be nice to refactor a bit, I think we could try to move it inself.attributesFixes #40447 by deleting ambiguously named
optional_attributes